Deep Structured Multi-Task Learning for Computer Vision in Autonomous Driving
The field of computer vision is currently dominated by deep learning. Convolutional
Neural Networks (CNNs) have become the predominant tool for solving almost any computer
vision task, and most state-of-the-art systems are built on their predictive capabilities.
Many of these systems use a simple encoder–decoder design, in which an off-the-shelf CNN
architecture is combined with a task-specific decoder and loss function to create an
end-to-end trainable model. This raises the question of whether such models are the
future of computer vision.
In this thesis we argue that this is not the case. We start off by discussing three limitations
of simple end-to-end training. We proceed by showing how it is possible to overcome those
limitations with an approach that we call structured modelling. The idea is to use CNNs
to compute a rich semantic intermediate representation, which is then used to solve the
actual problem by imposing geometric, task-related structure.
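As a minimal illustration of this idea (not the thesis's actual architecture), the geometric structure applied on top of a CNN's intermediate representation can be sketched as a soft-argmax decoding of a landmark heatmap; the synthetic heatmap and the soft-argmax step below are illustrative assumptions:

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Decode a semantic heatmap into (row, col) landmark coordinates.

    The heatmap stands in for the rich intermediate representation a CNN
    would produce; the soft-argmax is the geometric, task-related structure
    applied on top of it.
    """
    probs = np.exp(heatmap - heatmap.max())  # numerically stable softmax
    probs /= probs.sum()
    rows, cols = np.indices(heatmap.shape)
    return float((probs * rows).sum()), float((probs * cols).sum())

# A synthetic heatmap peaked at (2, 3): the decoded landmark lands nearby.
hm = np.zeros((5, 5))
hm[2, 3] = 10.0
r, c = soft_argmax_2d(hm)
```

Because the decoding is differentiable, such a structured output layer can still be trained end-to-end while keeping an interpretable intermediate representation.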
In this work we solve the localization, segmentation and landmark recognition tasks
using structured modelling, and we show that this approach can improve generalization,
interpretability and robustness. We also discuss how this approach is particularly useful
for real-time applications such as autonomous driving. Visual perception is a multi-module
problem that requires several different computer vision tasks to be solved. We discuss how,
by sharing computations, we can improve not only the inference speed but also the prediction
performance by using the structural relationship between the tasks. Lastly, we demonstrate
that structured modelling achieves state-of-the-art performance, making it a highly
relevant approach for solving current and future computer vision problems.
Trinity College, ESPCR, Qualcomm
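The computation-sharing idea above can be sketched as a single shared backbone feeding several task-specific heads, so the expensive forward pass runs once per image. The toy weights and head names here are hypothetical, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights: one shared backbone, two task-specific heads.
W_shared = rng.standard_normal((8, 16))
W_seg = rng.standard_normal((16, 4))   # e.g. a segmentation head
W_loc = rng.standard_normal((16, 2))   # e.g. a localization head

def backbone(x):
    # One shared forward pass (a ReLU layer standing in for a CNN encoder).
    return np.maximum(x @ W_shared, 0.0)

x = rng.standard_normal(8)
features = backbone(x)        # computed once ...
seg_out = features @ W_seg    # ... and reused by every task head
loc_out = features @ W_loc
```

Sharing the backbone cuts inference cost roughly in proportion to the number of tasks, and joint training lets the heads exploit the structural relationship between tasks.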
End-to-end Learning for Image-based Detection of Molecular Alterations in Digital Pathology
Current approaches for classification of whole slide images (WSI) in digital
pathology predominantly utilize a two-stage learning pipeline. The first stage
identifies areas of interest (e.g. tumor tissue), while the second stage
processes cropped tiles from these areas in a supervised fashion. During
inference, a large number of tiles are combined into a unified prediction for
the entire slide. A major drawback of such approaches is the requirement for
task-specific auxiliary labels that are not acquired in clinical routine. We
propose a novel learning pipeline for WSI classification that is trainable
end-to-end and does not require any auxiliary annotations. We apply our
approach to predict molecular alterations for a number of different use-cases,
including detection of microsatellite instability in colorectal tumors and
prediction of specific mutations for colon, lung, and breast cancer cases from
The Cancer Genome Atlas. Results reach AUC scores of up to 94% and are shown to
be competitive with state-of-the-art two-stage pipelines. We believe our
approach can facilitate future research in digital pathology and contribute to
solving a wide range of problems around the prediction of cancer phenotypes,
hopefully enabling personalized therapies for more patients in the future.
Comment: MICCAI 2022; 8.5 pages, 4 figures
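To make the contrast with two-stage pipelines concrete, the step those pipelines perform at inference (combining many tile predictions into one slide-level call) can be sketched as follows. Mean pooling of sigmoid scores is only an illustrative aggregation, not the learned end-to-end aggregation the abstract describes:

```python
import numpy as np

def slide_prediction(tile_logits):
    """Aggregate per-tile logits into a single slide-level probability.

    Converts each tile's logit to a probability with a sigmoid, then
    averages; an end-to-end model would instead learn this aggregation.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(tile_logits, dtype=float)))
    return float(probs.mean())

# Four hypothetical tile logits from one whole slide image.
p = slide_prediction([2.0, -1.0, 0.5, 3.0])
```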
Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery
Intraoperative segmentation and tracking of minimally invasive instruments is
a prerequisite for computer- and robotic-assisted surgery. Since additional
hardware such as tracking systems or robot encoders is cumbersome and lacks
accuracy, surgical vision is emerging as a promising technique for segmenting
and tracking instruments using only endoscopic images. However, what is
missing so far are common image data sets for consistent evaluation and
benchmarking of algorithms against each other. The paper presents a comparative
validation study of different vision-based methods for instrument segmentation
and tracking in the context of robotic as well as conventional laparoscopic
surgery. The contribution of the paper is twofold: we introduce a comprehensive
validation data set that was provided to the study participants and present the
results of the comparative validation study. Based on the results of the
validation study, we arrive at the conclusion that modern deep learning
approaches outperform other methods in instrument segmentation tasks, but the
results are still not perfect. Furthermore, we show that merging the results of
different methods significantly increases accuracy compared to the best
stand-alone method. On the other hand, the results of the instrument
tracking task show that this is still an open challenge, especially during
challenging scenarios in conventional laparoscopic surgery.
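One simple way to merge segmentation results from several methods, in the spirit of the fusion finding above, is a pixel-wise majority vote over binary masks. The study does not specify its fusion scheme, so this is an illustrative sketch only:

```python
import numpy as np

def fuse_masks(masks):
    """Fuse binary instrument masks from several methods by pixel-wise
    majority vote: a pixel is foreground if most methods mark it so."""
    stacked = np.stack(masks).astype(int)
    # Strict majority: vote count must exceed half the number of methods.
    return (stacked.sum(axis=0) * 2 > len(masks)).astype(int)

# Three hypothetical 2x2 masks from three different methods.
m1 = np.array([[1, 0], [1, 1]])
m2 = np.array([[1, 0], [0, 1]])
m3 = np.array([[0, 0], [1, 0]])
fused = fuse_masks([m1, m2, m3])
```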
Gate-Controlled Ionization and Screening of Cobalt Adatoms on a Graphene Surface
We describe scanning tunneling spectroscopy (STS) measurements performed on
individual cobalt (Co) atoms deposited onto backgated graphene devices. We find
that Co adatoms on graphene can be ionized by either the application of a
global backgate voltage or by the application of a local electric field from a
scanning tunneling microscope (STM) tip. Large screening clouds are observed to
form around Co adatoms ionized in this way, and we observe that some intrinsic
graphene defects display a similar behavior. Our results provide new insight
into charged impurity scattering in graphene, as well as the possibility of
using graphene devices as chemical sensors.
Comment: 19 pages, 4 figures
Detect-to-Retrieve: Efficient Regional Aggregation for Image Search
Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes 86k images with manually curated boxes from 15k unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially with no dimensionality increase, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data available at the project webpage https://github.com/tensorflow/models/tree/master/research/delf
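The "no dimensionality increase" property of regional aggregation can be illustrated with a deliberately simplified stand-in for R-ASMK: sum-pooling per-region descriptors followed by L2 normalization. The real R-ASMK additionally uses selective match kernels, which this sketch omits:

```python
import numpy as np

def aggregate_regions(region_descriptors):
    """Combine per-region descriptors into one holistic image descriptor
    by sum-pooling followed by L2 normalization.

    The output has the same dimensionality as each region descriptor,
    mirroring R-ASMK's no-dimensionality-increase property.
    """
    agg = np.sum(region_descriptors, axis=0)
    norm = np.linalg.norm(agg)
    return agg / norm if norm > 0 else agg

# Three hypothetical 2-D region descriptors from one image.
regions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
desc = aggregate_regions(regions)
```

Because the aggregated descriptor stays the same size as a single-region descriptor, the index grows with the number of images rather than the number of regions.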